1 Introduction

In [^] we have shown that the standard definition of compositionality
is formally vacuous; that is, any semantics can be easily encoded as a
compositional semantics. We have also shown that when compositional
semantics is required to be "systematic", it is possible to introduce
a non-vacuous concept of compositionality. However, a technical
definition of systematicity was not given in that paper; only examples
of systematic and non-systematic semantics were presented. As a
result, although our paper clarified the concept of compositionality,
it did not solve the problem of the systematic assignment of meanings.
In other words, we have shown that the concept of compositionality is
vacuous, but we have not replaced it with a better definition: one
that would both be mathematically correct and satisfy the common
intuition that some parts of grammars seem to have compositional
semantics, while others, like idioms, do not. We present such a
non-vacuous definition of compositionality in this chapter.

Compositionality has been defined as the property that the meaning of
a whole is a function of the meaning of its parts (cf. e.g. pp.
24-25). A slightly less general definition, e.g. postulates the
existence of a homomorphism from syntax to semantics. Although
intuitively clear, these definitions are not restrictive enough. The
fact that any semantics can be encoded as a compositional semantics
has some strange consequences. We can find, for example, an assignment
of meanings to phonemes, or even to the letters of the alphabet (as
the cabalists wanted), and ensure that the normal, intuitive meaning
of any sentence is a function of the meanings of the phonemes or
letters from which that sentence is composed (cf. [12]).

To address this kind of problem we have several options. We
can:

(a) Avoid admitting that there is a problem (e.g. by claiming that 
compositionality was never intended to be expressible in mathe- 
matical terms); 

(b) Add additional constraints on the shape or behavior of mean- 
ing functions (e.g. that they are "polynomial", preserve entail- 
ment, etc.); 

(c) Re-analyze the concept of compositionality, and the associated 
intuitions. That is, that the meaning of a sentence is derived in a 
systematic way from the meanings of the parts; that the meanings 
of the parts have some intuitive simplicity associated with them; 
and that compositionality is a gradeable property, i.e. one way of 
building compositional semantics might be better than another. 

We will follow course (c). The emphasis will be on simplicity, but the
development of ideas will be formal (although the mathematics will be
relatively simple). The bottom line will be that compositional
semantics can be defined as the simplest semantics that obeys the
compositionality principle.

2 Basic concepts and notations 

In this section we discuss the issues in representing linguistic
information, i.e. the relationship between languages and their models.
The first and simplest case to discuss is when natural language is
treated as a set of words; then, the simplest formal model of a
natural language corpus can be the corpus itself. A more complicated
model would be a grammar generating the sentences of the corpus; this
model is better because it is more compact.

A more interesting case arises when some semantics for the 
corpus is given. Then, representations become less obvious, and 
more complicated. Thus to keep the complexity of our presenta- 
tion under control, we will discuss only very simple cases of nat- 
ural language constructions. This should be enough to show how 
to define and build compositional semantics for small language 
fragments. 

Although our methods do not depend on the size and shape of the
corpora, we would like to point out that computing compositional
semantics for a large, real corpus of natural language sentences would
require a separate research project, and certainly goes beyond the
aims of this chapter.

The following issues will now be discussed: (1) representing 
corpora of sentences using grammars; (2) representing meaning 
functions; (3) the size and expressive power of representations. 

2.1 Notation and essential concepts 

2.1.1 Sentences, grammars, and meanings 

A corpus is an unordered set (bag) of sentences; a sentence is a 
sequence of symbols from some alphabet. 

A class is a set of sequences of symbols from the alphabet. In
our notation, {a|b|ac} denotes a class consisting of a, b, and ac.

The length of an expression is the number of its symbols. To make our
computations simpler, we will assume that all symbols of the alphabet
are atomic, and hence of length 1; the same holds for variables.
Parentheses, commas, and most of the other notational devices
({, }, |, "," ...) also all have length 1; but we will not count
semicolons, which we will occasionally use as a typographical device
standing for "end of line". In several cases, we will give the length
(in parentheses) together with an expression, e.g. {a|b|ac} (8).
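The counting convention can be checked mechanically. The following is a minimal Python sketch, not part of the original formalism: it assumes a class is given as a list of alternatives, each alternative being a list of atomic symbols, and the name `class_length` is ours.

```python
def class_length(alternatives):
    """Length of a class such as {a|b|ac}: two braces, one bar between
    consecutive alternatives, and one unit per atomic symbol."""
    braces = 2
    bars = len(alternatives) - 1
    symbols = sum(len(alt) for alt in alternatives)
    return braces + bars + symbols

# {a|b|ac}: the alternatives are a, b and the two-symbol sequence ac.
print(class_length([["a"], ["b"], ["a", "c"]]))  # 8
```

Representing each alternative as a list of symbols, rather than as a string, keeps the count correct when a symbol is written with more than one character.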

We define a (finite state) grammar rule as a sequence of classes.
E.g. the rule {a|b}{c|d} describes all the combinations ac, ad, bc,
bd. We will go beyond finite state grammars when we discuss
compositional semantics, and we will introduce an extension of this
notation then.
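Reading a rule as "pick one alternative from each class, in order" can be illustrated with a short Python sketch. The list-of-lists representation and the function name `expand_rule` are assumptions made for this illustration only.

```python
from itertools import product

def expand_rule(rule):
    """All strings described by a rule, where a rule is a sequence of
    classes and a class is a list of alternative strings."""
    return ["".join(choice) for choice in product(*rule)]

# The rule {a|b}{c|d}
print(expand_rule([["a", "b"], ["c", "d"]]))  # ['ac', 'ad', 'bc', 'bd']
```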

The reader should always remember that, mathematically, a 
function is defined as a set of pairs [argument, value]. Thus, a 
function does not have to be given by a formula. A formula is not 
a function, although it might define one: e.g. a description of one 
entity, like energy, depending on another, e.g. velocity, is typically 
given as a formula, which defines a function (a set of pairs). 

A meaning function is a (possibly partial) function that maps
sentences (and their parts) into (a representation of) their
meanings; typically, some set-theoretic objects like lists of features
or functions. A meaning function $\mu$ is compositional if for all
elements in its domain:

$$\mu(s.t) = \mu(s) \oplus \mu(t)$$

We restrict our attention to two-argument functions: "." denotes the
concatenation of symbols, and $\oplus$ is a function of two arguments.
However, the same concept can be defined if expressions are put
together by other, not necessarily binary, operations. In the
literature, $\oplus$ is often taken to be the composition of
functions; in this chapter it will mostly be used as an operator for
constructing a list, where some new attributes are added to $\mu(s)$
and $\mu(t)$. This has the advantage of being both conceptually
simpler (no need for type raising) and closer to the practice of
computational linguistics.

2.1.2 Minimum description length

The minimum description length (MDL) principle was proposed
by Rissanen [10]. It states that the best theory to explain a set of
data is the one which minimizes the sum of

• the length, in bits, of the description of the theory, and 

• the length, in bits, of data when encoded with the help of 
the theory. 
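In symbols, the two-part criterion can be written as follows. The notation is ours and is only a sketch: $L(\cdot)$ stands for description length, $D$ for the data, and $\mathcal{H}$ for the set of candidate theories.

```latex
H_{\mathrm{MDL}} \;=\; \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```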

In our case, the data is the language we want to describe, and the
encoding theory is its grammar (which includes the lexicon). The MDL
principle justifies the intuition that a more compact grammatical
description is better. At issue is what the best encoding is. To
address it, we will simply compare classes of encodings. The formal
side of the argument will be kept to a minimum, and the mathematics
will be simple: counting symbols^. Counting symbols instead of bits
does not change the line of the MDL argument, given an alternative
formulation of the MDL principle (p. 310 of §):

"Given a hypothesis space H, we want to select the hypothesis H such
that the length of the shortest encoding of D [i.e. the data]
together with the hypothesis H is minimal. In different applications,
the hypothesis H can be about different things. For example, decision
trees, finite automata, Boolean formulas, or polynomials."



The important aspect of the MDL method has to do with the fact that
this complexity measure is invariant with respect to the
representation language (because of the invariance of the Kolmogorov
complexity on which it is based). The existence of such invariant
complexity measures is not obvious; for example, H. Simon (in [11],
p. 228) wrote: "How complex or simple a structure is depends
critically upon the way in which we describe it. Most of the complex
structures found in the world are enormously redundant, and we can
use this redundancy to simplify their description. But to use it, to
achieve this simplification, we must find the right representation."

^ We assume that the corpus contains no errors (noise), so we do not
have to worry about defining prior distributions.



2.2 Encoding a corpus of sentences 

Assume that we are given a text in an unknown language (con- 
taining lower and uppercase letters and numbers): 

Xa0 + Yc1 + Xb0 + Xc0 + Ya0 + Yb0

(We use the pluses to separate utterances, so there is no order 
implied.) We are interested in building a grammar describing the 
text. For a short text, the simplest grammar might in fact be the 
grammar consisting of the list of all valid sentences: 

{Xa0|Yc1|Xb0|Xc0|Ya0|Yb0}

This grammar has only 25 symbols. However, if a second corpus is
added,

Za0 + Wc0 + Zb0 + Zc0 + Wa0 + Wb0

the listing grammar for the combined text would have 49 symbols, and
a shorter grammar, with only 39 symbols, can be found:

{X|Y|Z|W}{a|b}{0} (17)
{Y}{c}{1} (9)
{X|Z|W}{c}{0} (13)
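The symbol counts above, and the claim that the compact grammar describes exactly the twelve utterances, can be verified with a short Python sketch. It assumes that every symbol in this example is a single character, with the digits read as 0 and 1; the helper names `rule_length` and `expand_rule` are ours.

```python
from itertools import product

def rule_length(rule):
    # Each class costs 2 braces, (k - 1) bars and one unit per character;
    # every symbol in this example is a single character.
    return sum(2 + (len(c) - 1) + sum(len(alt) for alt in c) for c in rule)

def expand_rule(rule):
    return {"".join(choice) for choice in product(*rule)}

first = ["Xa0", "Yc1", "Xb0", "Xc0", "Ya0", "Yb0"]
second = ["Za0", "Wc0", "Zb0", "Zc0", "Wa0", "Wb0"]

listing_first = [first]           # one class listing the first corpus
listing_both = [first + second]   # one class listing both corpora
compact = [
    [["X", "Y", "Z", "W"], ["a", "b"], ["0"]],
    [["Y"], ["c"], ["1"]],
    [["X", "Z", "W"], ["c"], ["0"]],
]

print(rule_length(listing_first))            # 25
print(rule_length(listing_both))             # 49
print(sum(rule_length(r) for r in compact))  # 39
generated = set().union(*(expand_rule(r) for r in compact))
print(generated == set(first + second))      # True
```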



2.3 How to encode semantics? 

We will now examine a similar example that includes some simple
semantics.

Consider a set of nouns n_i, i ∈ 0..99, and a set of verbs v_j,
j ∈ 0..9. Let v0 be kick and n0 be bucket; all other noun-verb
combinations are intended to have normal, "compositional" meanings.
If our corpus were the 10 x 100 table consisting of all verb-noun
combinations:

v0 n0 + v1 n0 + ... + v_j n_i + ...

we could quickly use the previous example to write a simple finite
state grammar that describes the corpus:

{v0|v1|...|v9}{n0|n1|...|n99} (21 + 201)

But in this subsection we are supposed to introduce some semantics.
Thus, let our corpus consist of all those 1,000 sentences together
with their meanings, which, to keep things as simple as possible,
will be reduced to two attributes. Also, for simplicity, we assume
that only "kick bucket" has an idiomatic meaning, and that all other
entries are assigned the meaning consisting of the two-attribute
expression [[action, v_j], [object, n_i]]. Hence, our corpus will look
as follows:

kick bucket    action die    object nil
v1 bucket      action v1     object bucket
...
v_j n_i        action v_j    object n_i

Now, notice that this corpus cannot be encoded by means of a short
finite state grammar, because of the dependence of the meanings (i.e.
the values of the action and object attributes) on the first two
elements of each sentence. We will have to extend our grammar
formalism to address this dependence (Section 3).

2.4 On meaning functions

Even though we cannot encode the corpus by a short finite state
grammar, we can easily provide a compositional semantics for it. To
avoid the complications of type raising, we will build a homomorphic
mapping from syntax to semantics. To do so, it is enough to build
meaning functions in a manner ensuring that the meaning of each
v_j n_i is composed from the meaning of v_j and the meaning of n_i.
Since our corpus is simple, these meaning functions are simple, too.
For the verbs the meaning function is given by the table:

[v0,[verb,v0]]; [v1,[verb,v1]] ... [v9,[verb,v9]] (90)

For the nouns:

[n0,[noun,n0]]; [n1,[noun,n1]] ... [n99,[noun,n99]] (900)

We have represented both meaning functions as tables of symbols.
Since this chapter deals with sizes of objects, we compute them for
the meaning functions: the size of the first function is 90 = 10 x 9,
and the size of the second is 900 = 100 x 9. The meaning function for
the whole corpus could then be represented as a table with 1,000
entries:

[[[verb, v9], [noun, n99]], [[action, v9], [object, n99]]]
...
[[[verb, v_j], [noun, n_i]], [[action, v_j], [object, n_i]]]
...
[[[verb, v1], [noun, bucket]], [[action, v1], [object, bucket]]]
[[[verb, kick], [noun, bucket]], [[action, die], [object, nil]]]

and the size of this table is 29 x 1000. Finally, the total size of
the tables that describe the compositional interpretation of the
corpus is 29000 + 900 + 90, i.e. roughly 30,000. Notice that if we had
more verbs and nouns, the tables describing the meaning functions
would be even larger^. Also, note that we have not counted the cost of
encoding the positions of elements of the table, which would be the
log of the total number of symbols in the table. This simplifying
assumption does not change anything in the strength of our arguments
(as larger tables have longer encodings).
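The table sizes can be reproduced by building the three tables and counting symbols. This is a minimal Python sketch under the counting convention of Section 2.1.1 (atoms, brackets and commas each cost 1); the nested-list encoding and the names `size` and `combined_meaning` are assumptions of this illustration.

```python
def size(expr):
    """Symbol count of a bracketed list: an atom costs 1, and a list costs
    two brackets plus one comma between consecutive elements."""
    if isinstance(expr, str):
        return 1
    return 2 + (len(expr) - 1) + sum(size(e) for e in expr)

verbs = [f"v{j}" for j in range(10)]     # v0 stands for kick
nouns = [f"n{i}" for i in range(100)]    # n0 stands for bucket

verb_table = [[v, ["verb", v]] for v in verbs]
noun_table = [[n, ["noun", n]] for n in nouns]

def combined_meaning(v, n):
    if (v, n) == ("v0", "n0"):           # the idiom "kick bucket"
        return [["action", "die"], ["object", "nil"]]
    return [["action", v], ["object", n]]

combo_table = [[[["verb", v], ["noun", n]], combined_meaning(v, n)]
               for v in verbs for n in nouns]

verb_size = sum(size(e) for e in verb_table)     # 90
noun_size = sum(size(e) for e in noun_table)     # 900
combo_size = sum(size(e) for e in combo_table)   # 29000
print(verb_size + noun_size + combo_size)        # 29990
```

As in the text, only the entries themselves are counted, not the separators between entries or their positions.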

3 Compositional semantics through the Minimum Description Length
principle

In this section we first extend our notation to deal with semantic 
grammars. Then we apply the minimum description length princi- 
ple to construct a compact representation of our example corpus. 
This experience will motivate our new, non-vacuous definition of 
the notion of compositional semantics given in Section 4. 

3.1 Representations 

We have seen that it is impossible to efficiently encode our se- 
mantic corpus using a finite state grammar. Therefore, we have 
to make our representation of grammars more expressive (at the 
price of a slightly bigger interpreter). Namely, we will allow a sim- 
ple form of unification. 



^ The reader familiar with should notice that the meaning functions
obtained by the solution lemma also consist of tables of
element-value pairs. It is easy to see that for the corpus we are
encoding the solution lemma produces the same meaning functions.

In the other direction, the method for deriving compositional
semantics using the minimum description length principle (Sections 3
and 4) is directly applicable to meaning functions obtained by the
solution lemma in [Q], provided they are finite (which covers the
practically interesting cases); and it seems applicable to the
infinite case, if it has a finite representation. However, we will
not pursue this connection any further.

Example. Assume we do not want {a|b}{a|b|d} to generate ab.
We can do it by changing the notation:

X = {a|b}
{X}{X|d}

The intention is simple: first, we define a class variable (X) for the
class consisting of elements a and b; then, we generate all strings
using the rule with variable X: XX and Xd; and finally we substitute
for X all its possible values, which produces aa, ad, bb and bd.
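The substitution step, in which every occurrence of a class variable within one rule receives the same value, can be sketched in Python as follows. This is an illustration only: the dictionary-based representation and the function name `expand` are ours, and nested variables (a variable whose value is another variable) are not handled.

```python
from itertools import product

def expand(rule, classes):
    """Expand a rule whose classes may mention class variables.
    rule:    list of classes, each a list of terms (strings)
    classes: dict mapping a variable name to its list of values
    Every occurrence of a variable in one rule gets the same value."""
    results = set()
    for terms in product(*rule):              # e.g. ("X", "X") and ("X", "d")
        used = sorted({t for t in terms if t in classes})
        for values in product(*(classes[v] for v in used)):
            binding = dict(zip(used, values))
            results.add("".join(binding.get(t, t) for t in terms))
    return results

# X = {a|b}; {X}{X|d}
print(sorted(expand([["X"], ["X", "d"]], {"X": ["a", "b"]})))
# ['aa', 'ad', 'bb', 'bd']
```

Without the variable, the plain rule {a|b}{a|b|d} would also produce ab and ba, which is exactly what the shared binding rules out.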



More generally, let us assume that we have an alphabet a1, a2, ...
and a set of (class) variables X1, X2, .... A grammar term, denoted
by t_i, is either a sequence of symbols from the alphabet or a class
variable. By a grammar rule we will understand one of the three
expressions

X_m = {X_j} ... {X_n}
X_m = {t_1| ... |t_k}
X_m = X_i

A grammar is a collection of grammar rules. The language generated
by the grammar is defined as above.

Thus, new classes are obtained from elements of the alphabet by
either the merge operation, which on two classes X and Y produces a
new class C_XY consisting of the set-theoretic union of the two:
C_XY = {X|Y}; or by concatenating elements of two or more classes. We
permit renaming of classes, because we want to be able to express
constructions like Noun_person know Noun_person:

N1 = Noun_person
N2 = Noun_person

{N1}{know}{N2}

3.2 An MDL algorithm for encoding semantic corpora

In [Q] a greedy algorithm for clustering elements into classes is
presented. The algorithm tries to minimize the description length of
grammars according to the MDL principle. This algorithm would not
work properly on our semantic corpus, because Grünwald's
representation language is not expressive enough. However, the
representation of grammars we introduced above allows us to use the
same algorithm with only minor changes.

The basic steps of the greedy MDL algorithm are as follows: 

1. Assign a separate class {w} to each different word (symbol) in
the corpus. Substitute the class for each word in the corpus.
This is the initial grammar G.

2. Compute the total description length (DL) of the corpus
(i.e. the sum of the DL of the corpus given G and the DL
of G).

3. Compute for all pairs of classes in G the difference in DL
that would result from a merge of these two classes.

4. Compute for all pairs of classes C_i, C_j in G the difference in
DL that would result from the construction of a new class
given by the concatenation rule

X = {C_i}{C_j}

5. If there are one or more operations that result in a smaller
new DL, perform the operation that produces the smallest
new DL, and go to Step 2.

6. Otherwise, stop.
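The control flow of these steps can be illustrated on a deliberately simplified setting. The Python sketch below is not the algorithm of [Q] or the one used in this chapter: it assumes two-word sentences, keeps classes inside rules instead of naming them, implements only a language-preserving merge (Step 3), and omits the concatenation operation of Step 4. All function names are ours.

```python
from itertools import combinations

def class_len(c):
    # {w1|...|wk}: 2 braces, k - 1 bars, k atomic words
    return 2 + (len(c) - 1) + len(c)

def dl(grammar):
    # Description length: total symbol count of all rules
    return sum(class_len(a) + class_len(b) for a, b in grammar)

def merges(grammar):
    """Candidate grammars obtained by fusing two rules that differ in
    exactly one position; such a merge never changes the language."""
    for r1, r2 in combinations(grammar, 2):
        (a1, b1), (a2, b2) = r1, r2
        if b1 == b2:
            yield (grammar - {r1, r2}) | {(a1 | a2, b1)}
        elif a1 == a2:
            yield (grammar - {r1, r2}) | {(a1, b1 | b2)}

def greedy_mdl(corpus):
    # Step 1: one singleton class per word, one rule per sentence
    grammar = frozenset((frozenset([x]), frozenset([y])) for x, y in corpus)
    while True:
        # Steps 2-3: score every candidate merge by its description length
        best = min(merges(grammar), key=dl, default=None)
        # Steps 5-6: apply the best one only if it shortens the description
        if best is None or dl(best) >= dl(grammar):
            return grammar
        grammar = best

corpus = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
final = greedy_mdl(corpus)
print(dl(final))  # 10, i.e. the single rule {a|b}{c|d}
```

Because each step only takes the locally best merge, the loop can stop at a grammar that is not globally shortest on other inputs.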

^ There is no guarantee that the algorithm will produce the minimum
length description.

3.3 Applying the MDL algorithm to encode a semantic corpus

We will now show how the algorithm applies to our corpus of 1,000
sentences. By Step 1, the initial grammar G0 looks as follows:

Initial grammar G0:

{kick}{bucket}{action}{die}{object}{nil}
{v1}{bucket}{action}{v1}{object}{bucket}
...
{v_j}{n_i}{action}{v_j}{object}{n_i}

Step 2. Computing the total length: The grammar describes the corpus.
The size of the initial grammar is 18,000 symbols (1,000 rules of 18
symbols each, not counting the encoding of the positions of the
beginning of each rule). For all the grammars obtained by the steps
of the algorithm, the total length will be the size of the grammar
plus the size of the machine that generates languages from grammars.
But, since the size of this machine is constant, we can remove it
from our considerations.

Step 3. Merging. Consider the merge operation for two nouns, and the
new class N_kl = {n_k|n_l} (7), k, l > 0. The resulting new
description of the corpus is shorter since it removes 20 entries with
n_k, n_l of total length 360, and adds two entries of total length 25:

{v_i}{N_kl}{action}{v_i}{object}{N_kl}
N_kl = {n_k|n_l}

However, the merge operation for two verbs produces a better grammar.
The new class V_kl = {v_k|v_l} (7), k, l > 0, removes 200 entries with
v_k, v_l of total length 3600, and adds one entry of length 25:

{V_kl}{n_j}{action}{V_kl}{object}{n_j}
V_kl = {v_k|v_l}



Notice that merging another verb with kick would save only 199 
rules, so it will not be done in the initial stages of the application 
of the algorithm. 

Step 4. The reader may check that this step would not re- 
duce the size of the grammar. (This is due to the corpus being so 
simple, and without substructures worth encoding). 

Step 5. The successive merges of the v_i's (i > 0) will produce
the following grammar:

Grammar G_V(1):

V(1) = {v1| ... |v9} (21)
{V(1)}{n0}{action}{V(1)}{object}{n0} (18)
{V(1)}{n1}{action}{V(1)}{object}{n1} (18)
...
{V(1)}{n99}{action}{V(1)}{object}{n99} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)
{v0}{n1}{action}{v0}{object}{n1} (18)
...
{v0}{n99}{action}{v0}{object}{n99} (18)

What happens next depends on whether our algorithm is very greedy;
namely, whether we insist that all instances of the merging classes
are replaced by the result of the merge. If that is the case, we
cannot do the merge V(0) = {V(1)|v0}, and we will do the merge of the
nouns. These merges will produce

Grammar G_V(1)N(1):

V(1) = {v1| ... |v9} (21)
N(1) = {n1| ... |n99} (201)
{V(1)}{n0}{action}{V(1)}{object}{n0} (18)
{V(1)}{N(1)}{action}{V(1)}{object}{N(1)} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)
{v0}{N(1)}{action}{v0}{object}{N(1)} (18)


This is our final grammar (Step 6), if the algorithm is very greedy.
We can see that it is much smaller than the original grammar: its
total length is less than 300 symbols (vs. 18,000); but it assumes
the existence of a language generator. Interestingly, the grammar
resembles the compositional semantics, as usually given. The rule
with V(1) and N(1) describes the compositional part of the corpus;
the rule with v0 and n0 describes the idiomatic part; the other rules
are in between.
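The sizes quoted in this section follow from two small counts: a class definition costs its name, the equals sign, two braces, its terms and the bars between them, and every rule with six one-term classes costs 18. A minimal Python check, with helper names of our own choosing:

```python
def definition_length(n_terms):
    # "Name = {t1|...|tk}": name, equals sign, 2 braces, k terms, k - 1 bars
    return 2 + 2 + n_terms + (n_terms - 1)

RULE = 18  # six one-term classes, e.g. {v}{n}{action}{v}{object}{n}

initial = 1000 * RULE                        # one rule per sentence
g_v1 = definition_length(9) + 200 * RULE     # V(1) plus 100 + 100 rules
g_v1n1 = definition_length(9) + definition_length(99) + 4 * RULE

print(initial, g_v1, g_v1n1)  # 18000 3621 294
```

The last figure is the total behind the statement that the final grammar has fewer than 300 symbols.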



3.4 Variations on the MDL algorithm 

A similar result is obtained when we do not insist that all instances
of merging classes are replaced by the result of the merge. Starting
with the grammar

Grammar G_V(1):

V(1) = {v1| ... |v9} (21)
{V(1)}{n0}{action}{V(1)}{object}{n0} (18)
{V(1)}{n1}{action}{V(1)}{object}{n1} (18)
...
{V(1)}{n99}{action}{V(1)}{object}{n99} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)
{v0}{n1}{action}{v0}{object}{n1} (18)
...
{v0}{n99}{action}{v0}{object}{n99} (18)

we can see that the merge V(0) = {v0|V(1)} will decrease the size
of the grammar by 99 rules and result in:

Grammar G_V(0):

V(1) = {v1| ... |v9} (21)
V(0) = {v0|V(1)} (7)
{V(1)}{n0}{action}{V(1)}{object}{n0} (18)
{V(0)}{n1}{action}{V(0)}{object}{n1} (18)
...
{V(0)}{n99}{action}{V(0)}{object}{n99} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)



The successive merging of nouns will then produce

Grammar G_V(0)N(1):

V(0) = {v0|V(1)} (7)
V(1) = {v1| ... |v9} (21)
N(1) = {n1| ... |n99} (201)
{V(0)}{N(1)}{action}{V(0)}{object}{N(1)} (18)
{V(1)}{n0}{action}{V(1)}{object}{n0} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)



If, however, we do not do the V(0) = {v0|V(1)} merge, and proceed
with the merging of the nouns (e.g. if there were reasons to modify
the algorithm), we get:

Grammar G_V(1)N(0):

V(1) = {v1| ... |v9} (21)
N(1) = {n1| ... |n99} (201)
N(0) = {n0|N(1)} (7)
{V(1)}{N(0)}{action}{V(1)}{object}{N(0)} (18)
{v0}{N(1)}{action}{v0}{object}{N(1)} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)



Finally, if we allow some overgeneralization, we can replace the
above grammars with an even shorter grammar:

Grammar G_V(0)N(0):

V = {v0| ... |v9} (23)
N = {n0| ... |n99} (203)
{V}{N}{action}{V}{object}{N} (18)
{v0}{n0}{action}{v0}{object}{nil} (18)

Here, clearly v0 is the idiomatic element. However, both the
idiomatic and the non-idiomatic readings of kick bucket are allowed.
(In the previously defined grammars, we can also see the distinction
between the idiomatic and non-idiomatic elements.)
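The effect of the overgeneralization can be made concrete by expanding the two rules of the last grammar and collecting the readings of each verb-noun pair. The Python sketch below is an illustration under our own encoding of readings as tuples; it follows the two rules exactly as they are printed.

```python
from collections import defaultdict

verbs = [f"v{j}" for j in range(10)]     # v0 = kick
nouns = [f"n{i}" for i in range(100)]    # n0 = bucket

readings = defaultdict(set)

# {V}{N}{action}{V}{object}{N}: V and N take the same value in each slot
for v in verbs:
    for n in nouns:
        readings[(v, n)].add((("action", v), ("object", n)))

# {v0}{n0}{action}{v0}{object}{nil}: the second rule, as printed
readings[("v0", "n0")].add((("action", "v0"), ("object", "nil")))

ambiguous = [pair for pair, r in readings.items() if len(r) > 1]
print(ambiguous)  # [('v0', 'n0')]
```

Only the pair standing for kick bucket receives two readings; the other 999 pairs receive exactly one.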

4 A non-vacuous definition of compositionality

The fact that the MDL principle can produce an object resembling a
compositional semantics is crucial. It allows us to argue for a
non-vacuous definition of compositionality.

Assume that we have a corpus S of sentences and their parts, given
either as a set or generated by a grammar. Let sentences and their
parts be collections of symbols put together by some operations; in
the simplest and most important case, by concatenation.

Definition. A meaning function $\mu$ is a compositional semantics
for the set S if its domain is contained in S, and

a. it satisfies the postulate of compositionality: for all s, t in
its domain:

$$\mu(s.t) = \mu(s) \oplus \mu(t)$$

b. it is the shortest such encoding, in the sense of the Minimum
Description Length principle;

c. it is maximal, i.e. there is no $\mu'$ with a larger domain that
satisfies a and b.

To see better what this definition entails, let us consider our
semantic corpus again. The set S consists of the 10 verbs and 100
nouns and all noun-verb combinations. The compositional function
$\mu$ assigns to each word its category, e.g. $[n_j, \mathit{noun}]$.
The question is how to define the operator $\oplus$. Because of the
idiom, it cannot be a total function; hence we have to exclude from
its domain the pair $[[v_0, \mathit{verb}], [n_0, \mathit{noun}]]$.
The shortest description of $\oplus$ can be given by translating the
grammar of Section 3.2. First, map non-idiomatic verbs and nouns into
pairs $\mu(v_i) = [v_i, \mathit{verb}_{nonid}]$,
$\mu(n_j) = [n_j, \mathit{noun}_{nonid}]$, $i, j > 0$. Then, put

$$\oplus([[v, \mathit{verb}_{nonid}], [n, \mathit{noun}_{nonid}]]) = [\mathit{action}.v, \mathit{object}.n]$$

Thus defined, $\mu$ and $\oplus$ correspond to the grammar obtained by
the algorithm of Section 3.2 and to the tables of Section 2. This
correspondence is not exact, because the functions $\mu$ and $\oplus$
encode only the systematic, compositional part of the corpus. (But
please note this clear distinction between the idiomatic and the
compositional parts of the lexicon and the corpus.)






However, this description of the two functions is not maximal. We
obtain the maximal compositional semantics for S by extending the
above defined mapping to all nouns, $\mu(n_j) = [n_j, \mathit{noun}]$,
$j \geq 0$, and extending the domain of $\oplus$:

$$\oplus([[v, \mathit{verb}_{nonid}], [n, \mathit{noun}]]) = [\mathit{action}.v, \mathit{object}.n]$$

It is easily checked that this is the shortest (in the sense of the
MDL) and maximal assignment of meaning to the elements of set S.*
Please compare this mapping with G_V(1)N(0), and also note that now
we have a formal basis for saying that (for this corpus) it is the
verb kick, and not the noun bucket, that is idiomatic.
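The maximal mapping can be written out as a small partial function. The following Python sketch is an illustration under stated assumptions: tuples stand in for the bracketed lists, `None` marks a pair outside the domain of the combination operator, and the plain category label given to kick is our reading of the excluded pair mentioned earlier in this section.

```python
verbs = [f"v{j}" for j in range(10)]     # v0 = kick (idiomatic)
nouns = [f"n{i}" for i in range(100)]    # n0 = bucket

def mu(word):
    """Lexical meaning: the word paired with its category."""
    if word in nouns:
        return (word, "noun")
    if word == "v0":
        return (word, "verb")            # kick is kept apart from other verbs
    return (word, "verb_nonid")

def oplus(m_verb, m_noun):
    """Partial combination operator: defined only for non-idiomatic verbs."""
    (v, v_cat), (n, n_cat) = m_verb, m_noun
    if v_cat == "verb_nonid" and n_cat == "noun":
        return (("action", v), ("object", n))
    return None                          # outside the domain of the operator

defined = [(v, n) for v in verbs for n in nouns
           if oplus(mu(v), mu(n)) is not None]
print(len(defined))  # 900
```

Every noun, including bucket, combines with the nine non-idiomatic verbs, while no combination with kick is in the domain; this mirrors the observation that the verb, not the noun, is the idiomatic element.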

What are the advantages of defining compositionality using the
Minimum Description Length principle? 1. It brings us back to the
original definition of compositionality, but makes it non-vacuous.
2. It encodes the postulate that the meaning functions should be
simple. 3. It allows us to distinguish between compositional and
non-compositional semantics by means of systematicity, i.e. the
minimality of encodings, as e.g. Hirst |^ wanted. 4. It does not make
a reference to non-intrinsic properties of meaning functions (like
being a polynomial). 5. It works for different models of language
understanding: pipeline (syntax, semantics, pragmatics), construction
grammars (cf. [^), and even semantic grammars. 6. It allows us to
compare different meaning functions with respect to how compositional
they are: we can measure the size of their domains and the length of
the encodings. Finally, this definition might even satisfy those
philosophers of language who regard compositionality not as a formal
property but as an unattainable ideal worth striving for. This hope
is based on the fact that, given an appropriately rich model of
language, its minimum description length is, in general,
non-computable, and can only be approximated but never exactly
computed.

* We are assuming that we have to assign the noun and verb categories
to the lexical symbols of the corpus.

5 Discussion and Conclusions 

Lambdas, approximations, and the minimum description length

Assuming that we have a λ-expression interpreter (e.g. a Lisp
program), we could describe the meaning functions of Section 3
as:

λX.[noun, X]
λY.[verb, Y]
λ[verb, Y][noun, X].[[action, Y], [object, X]]
λ[verb, kick][noun, bucket].[[action, die], [object, nil]]

The approximate total size of this description is
size(λ-interpreter) + 66 (the above definitions) + 110 (to describe
the domains of the first two functions).

Clearly, the last lambda expression corresponds to an idiomatic
meaning. But note that this definition also assigns the non-idiomatic
meaning to "kick bucket". Thus, although much simpler, it does not
exactly correspond to the original meaning function. It does,
however, correspond to grammar G_V(0)N(0) of the previous section.
Also, representations that ignore exceptions are more often found in
the literature. This point may be worth pursuing: Savitch argues that
an approximate representation in a more expressive language can be
more compact. For approximate representations that overgeneralize,
the idiomaticity of an expression can be defined as the existence of
a more specific definition of its meaning.
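The behaviour of the four definitions can be imitated with ordinary Python functions. This is a sketch of our own, not the chapter's notation: the names `general`, `idiom` and `meanings` are assumptions, and the noun "ball" is a hypothetical example word that does not come from the corpus.

```python
def noun(x):
    return ["noun", x]

def verb(y):
    return ["verb", y]

def general(v, n):
    # General pattern: [verb, Y] [noun, X] -> [[action, Y], [object, X]]
    (_, y), (_, x) = v, n
    return [["action", y], ["object", x]]

def idiom(v, n):
    # Specific pattern: [verb, kick] [noun, bucket] -> die / nil
    if (v, n) == (verb("kick"), noun("bucket")):
        return [["action", "die"], ["object", "nil"]]
    return None

def meanings(v, n):
    """All meanings licensed by the definitions; nothing blocks the
    general pattern when the more specific one also applies."""
    return [m for m in (idiom(v, n), general(v, n)) if m is not None]

print(len(meanings(verb("kick"), noun("bucket"))))  # 2
print(len(meanings(verb("kick"), noun("ball"))))    # 1
```

In this reading, an expression is idiomatic exactly when `idiom` returns a value for it, i.e. when a more specific definition exists alongside the general one.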

Bridging linguistic and probabilistic approaches to natural language

The relationship between linguistic principles and the MDL method is
not completely surprising. We used the MDL principle in [13] to argue
for a construction-based approach to language understanding (cf. Ip).
After setting up a formal model based on linguistic and computational
evidence, we applied the MDL principle to prove that
construction-based representations are at least an order of magnitude
more compact than the corresponding lexicalized representations of
the same linguistic data. The argument presented there suggests that
in building compositional semantics we might be better off when the
language is built by means of rich combinatorics (constructions) than
by the concatenation of lexical items. However, this hypothesis
remains to be proved.



It is known that the most important rules of statistical reasoning,
the maximum likelihood method, the maximum entropy method, the Bayes
rule and the minimum description length, are all closely related (cf.
pp. 275-321 of Q). From the material of Sections 3 and 4 we can see
that compositionality is closely related to the MDL principle; thus,
it is possible to imagine bringing together linguistic and statistical
methods for natural language understanding. For example, one could
start with the semantic classes of Q and continue the derivation of a
semantic model for a large corpus using the method of Section 3, with
the computational implementation along the lines of

Conclusion 

We have redefined the linguistic concept of compositionality as 
the simplest maximal description of data that satisfies the postu- 
late that the meaning of the whole is a function of the meaning 
of its parts. By justifying compositionality by the minimum de- 
scription length principle, we have placed the intuitive idea that 
the meaning of a sentence is a combination of the meanings of its 
constituents on a firm mathematical foundation. 

This new, non-vacuous definition of compositionality is intuitive and
allows us to distinguish between compositional and non-compositional
semantics, and between idiomatic and non-idiomatic expressions. It is
not ad hoc, since it does not make any references to non-intrinsic
properties of meaning functions (like being a polynomial). It works
for different models of language understanding. Moreover, it allows us
to compare different meaning functions with respect to how
compositional they are.

Finally, because of the close relationship between the mini- 
mum description length principle and probability, the approach 
proposed in this chapter should bridge logic-based and statistics- 
based approaches to language understanding. 